Journal of Computer Applications

Bias challenges of large language models: identification, evaluation, and mitigation

XU Yuemei1, YE Yuqi2, HE Xueyi1   

  1. School of Information Science and Technology, Beijing Foreign Studies University; 2. International Business School, Beijing Foreign Studies University
  • Received: 2024-09-23 Revised: 2024-12-09 Online: 2024-12-24 Published: 2024-12-24
  • Supported by:
    National Social Science Foundation (24CYY107); Humanities and Social Sciences Project of Ministry of Education (22YJA630018); Chinese Information Society SMP Zhipu AI Large Model Interdisciplinary Fund; Fundamental Research Funds for Central Universities (2024TD001)

  • Corresponding author: XU Yuemei
  • About the authors: XU Yuemei (born 1985), female, from Wuzhou, Guangxi; associate professor, Ph. D.; main research interest: cross-lingual natural language processing. YE Yuqi (born 2000), female, from Meishan, Sichuan; master's student; main research interest: natural language processing. HE Xueyi (born 2003), female, from Mianyang, Sichuan; main research interest: natural language processing.

Abstract: Given the safety and ethical concerns arising from bias in Large Language Models (LLMs), this review provides a comprehensive analysis of current research, techniques, and limitations related to bias in LLMs. The analysis is structured around three key aspects: bias identification, bias evaluation, and bias mitigation. Firstly, three key techniques underlying LLMs were examined to uncover the fundamental causes of their unavoidable intrinsic bias. Secondly, the types of bias present in LLMs were categorized into linguistic bias, demographic bias, and evaluation bias, and the characteristics and underlying causes of each were explored. Thirdly, a systematic review of existing bias evaluation benchmarks was provided, discussing the strengths and limitations of general-purpose, language-specific, and task-specific benchmarks. Finally, current bias mitigation techniques were analyzed from the perspectives of model debiasing and data debiasing, and directions for future improvement were highlighted. The findings point to three research directions for bias in LLMs: evaluating bias from a multicultural perspective, developing lightweight bias mitigation techniques, and enhancing the interpretability of bias.

Key words:  large language model, bias tracing, bias identification, bias evaluation, bias mitigation

关键词: 大语言模型, 偏见溯源, 偏见识别, 偏见评估, 偏见去除
